Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

mimezzz
by Contributor
  • 4406 Views
  • 8 replies
  • 10 kudos

Resolved! Dataframe rows missing after write_to_delta and read_from_delta

Hi, I am trying to load MongoDB data into S3 using PySpark 3.1.1 by reading it into Parquet. My code snippets look like:

df = spark \
    .read \
    .format("mongo") \
    .options(**read_options) \
    .load(schema=schema)
df = df.coalesce(64)
write_df_to_del...

Latest Reply
mimezzz
Contributor
  • 10 kudos

So I think I have solved the mystery here: it had to do with the retention config. With retentionEnabled set to True and the retention hours set to 0, we somehow lose a few rows in the first file, as they were mistaken for files from the last session and ...

7 More Replies
prem0305
by New Contributor
  • 562 Views
  • 1 replies
  • 0 kudos

I am not able to log in with my credentials

I am not able to log in with my credentials. This keeps happening again and again. I created a different account, and I am still facing the same problem. Please help me resolve this issue. I am a new learner here.

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @PREM RANJAN, it might be a temporary issue. For any issue with Academy learning or certifications, you can raise a ticket at the link below; I am sharing it for your future reference as well. https://help.databricks.com/s/contact-us?ReqType=train...

ivanychev
by Contributor
  • 1063 Views
  • 2 replies
  • 0 kudos

Resolved! When will Databricks on AWS support c6i/m6i/r6i EC2 instance types?

The instances are almost 1.5 years old now and provide better efficiency than the 5th generation.

Latest Reply
LandanG
Honored Contributor
  • 0 kudos

@Sergey Ivanychev Those instance types are under development and should be GA very soon. No official date, AFAIK.

1 More Replies
labromb
by Contributor
  • 3320 Views
  • 7 replies
  • 6 kudos

Databricks Jobs and CICD

Hi, we currently use Azure DevOps to source-control our notebooks and use CICD to publish the notebooks to different environments, and this works very well. We do not have the same functionality available for Databricks jobs (the ability to sourc...

Latest Reply
JRT5933
New Contributor III
  • 6 kudos

My team is currently looking at establishing REPO(s) for source control to start. I know I've seen some documentation on auto-updating the main branch in the DBX remote repo when a MERGE is completed. Does anyone have a template and/or best practices ...

6 More Replies
Ullsokk
by New Contributor III
  • 1967 Views
  • 4 replies
  • 0 kudos

Running notebook from another notebook does not work when running notebook from github actions

I have a setup notebook that uses %run to run a series of notebooks. The notebook is in the root folder of my repo. In a subfolder I have several notebooks I want to run. If I run the notebook in Databricks, the relative paths work (%run "./subfolde...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

It seems that the directory is different under GitHub Actions. I'm not sure what exactly the mentioned GitHub Action does or which one it is. Maybe you can share the GitHub Action you used.

3 More Replies
SrinMand_34861
by New Contributor II
  • 1436 Views
  • 4 replies
  • 1 kudos

Passing the secret scope to the url

We are trying to call a URL using credentials. We are able to get the data when we hard-code the credentials, but no data is returned when we pass the credentials from the secret scope. Below is the code:

import requests
source_db_scope = "dev-hnd-secret-s...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please try to debug what the secret scope is returning. The ugly way to do it is:

for letter in username:
    print(letter, ' ')
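
A slightly expanded version of that debugging idea might look like the sketch below. It relies on the fact that Databricks redacts secret values printed verbatim in notebook output, so spacing the characters out is one way to inspect them. The helper names `spell_out` and `describe` are hypothetical, and the literal `username` value is a stand-in so the sketch runs outside a notebook; in the thread, the value would come from the secret scope. A check like this can reveal stray whitespace or a trailing newline, which is one possible (unconfirmed) reason a secret behaves differently from a hard-coded value.

```python
def spell_out(value: str) -> str:
    """Return the characters of `value` separated by single spaces."""
    return " ".join(value)


def describe(value: str) -> dict:
    """Summarize a string so hidden problems (padding, newlines) become visible."""
    return {
        "length": len(value),
        "has_outer_whitespace": value != value.strip(),
        "spelled": spell_out(value),
    }


# In a notebook, `username` would come from the secret scope, for example:
#   username = dbutils.secrets.get(scope=source_db_scope, key="<key name>")
# A literal stands in here (assumption) so the sketch runs anywhere.
username = "svc_user\n"  # hypothetical value with a stray trailing newline

print(describe(username))
```

Remove any such debugging output once the cause is found, since it exposes the secret to anyone who can view the notebook.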

3 More Replies
jonathan-dufaul
by Valued Contributor
  • 1466 Views
  • 2 replies
  • 2 kudos

Resolved! Does anyone have a single example of a graphframe with two+ types of vertices? (e.g. user and post, not user to user)

I have gone through about 75 pages, and every single example has only relationships from one type of object to the same type of object. About 90% have the exact same example of "Alice", "Bob", "friends". Has anyone ever made a graphframe with two types of ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

I feel your pain. I once tried to use graphframes to flatten a complex tree and ended up using graphX (which is even worse to use, but at least it is more flexible). So maybe take a look at graphX? Beware, it is terrible to use. I wonder what happened to m...
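
For the original question, one pattern that is sometimes used (it is not taken from this thread) is to keep all vertices in a single table, add a `type` column, and make ids unique across types. The sketch below builds such rows in plain Python and checks them; the `user:`/`post:` id prefixes, the column names other than `id`, `src`, and `dst`, and the sample data are all illustrative assumptions. GraphFrames itself only requires an `id` column on vertices and `src`/`dst` columns on edges.

```python
# Vertex rows: (id, type, name). The "type" column and id prefixes are
# illustrative choices, not something GraphFrames requires.
VERTEX_COLUMNS = ["id", "type", "name"]
vertex_rows = [
    ("user:1", "user", "alice"),
    ("user:2", "user", "bob"),
    ("post:1", "post", "Intro to Delta"),
    ("post:2", "post", "GraphFrames tips"),
]

# Edge rows: (src, dst, relationship). Edges may connect different vertex types.
EDGE_COLUMNS = ["src", "dst", "relationship"]
edge_rows = [
    ("user:1", "post:1", "wrote"),
    ("user:2", "post:2", "wrote"),
    ("user:2", "post:1", "liked"),
    ("user:1", "user:2", "follows"),
]


def dangling_edges(vertices, edges):
    """Return edges whose src or dst is not a known vertex id."""
    ids = {row[0] for row in vertices}
    return [e for e in edges if e[0] not in ids or e[1] not in ids]


def edge_type_pairs(vertices, edges):
    """Return the set of (src type, relationship, dst type) combinations."""
    kind = {row[0]: row[1] for row in vertices}
    return {(kind[s], rel, kind[d]) for s, d, rel in edges}


print(dangling_edges(vertex_rows, edge_rows))
print(sorted(edge_type_pairs(vertex_rows, edge_rows)))

# With Spark and the graphframes package installed, the same rows could be loaded as:
#   from graphframes import GraphFrame
#   v = spark.createDataFrame(vertex_rows, VERTEX_COLUMNS)
#   e = spark.createDataFrame(edge_rows, EDGE_COLUMNS)
#   g = GraphFrame(v, e)
```

Filtering on the `type` column of the vertex table is then enough to restrict a query to users or to posts.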

1 More Replies
riccostamendes
by New Contributor II
  • 2535 Views
  • 2 replies
  • 0 kudos

Just a doubt, can we develop a Kedro project in Databricks?

I am asking this because up to now I have only seen examples of deploying a pre-existing Kedro project in Databricks in order to run some pipelines...

Latest Reply
riccostamendes
New Contributor II
  • 0 kudos

Yes, you can deploy a pre-existing Kedro project in Databricks, but as far as I know you cannot create it there. You have to create it somewhere else and then deploy it in Databricks.

1 More Replies
alwinsa
by New Contributor III
  • 3393 Views
  • 3 replies
  • 8 kudos

Data type not shown correctly in SQL editor

When selecting from a table in the SQL editor, the preview doesn't always show the column's actual data type. E.g. I have a decimal() data type in one of my tables, and when I select it, it previews as a float with 2 decimals (which is different fro...

Latest Reply
alwinsa
New Contributor III
  • 8 kudos

Hey, thanks for your response! That definitely seems like what's happening! I'm new to Databricks -- where can I find that editor? So my problem was actually two-pronged, but I only outlined part of the problem above, which you seem to have solved! The ot...

2 More Replies
andrew0117
by Contributor
  • 15957 Views
  • 2 replies
  • 0 kudos

Resolved! How to read a Delta table from the path?

An unmanaged Delta table was dropped, but the underlying data is still there. Now I'm trying to rebuild it, but I don't know the schema. So I tried: val myTable = DeltaTable.forPath("myPath"). But how can I get the data or schema out of myTable? Thanks!

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@andrew li

df = spark.read.format("delta").load("/file_path")
df.printSchema()

1 More Replies
Raghu101
by New Contributor III
  • 11760 Views
  • 6 replies
  • 8 kudos

Resolved! Databricks to Oracle

How do I write data from Databricks SQL to an Oracle DB?

Latest Reply
ramravi
Contributor II
  • 8 kudos

We can use a JDBC driver to write a DataFrame to Oracle tables. Every database uses a JDBC connection to connect to and access the database, so you can follow the same process for connecting to any database.

Download Oracle ojdbc6.jar JDBC Driver

You need an Oracle jdbc dr...
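
As a rough sketch of the JDBC approach described above, the options for Spark's generic JDBC data source can be assembled as a plain dictionary. The helper name `oracle_jdbc_options` and every connection value (host, port, service name, table, user) are hypothetical placeholders, and the password is shown as a literal only to keep the sketch self-contained; on Databricks it would normally come from a secret scope.

```python
def oracle_jdbc_options(host, port, service_name, table, user, password):
    """Build the option dict for Spark's generic JDBC data source (Oracle thin driver)."""
    return {
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service_name}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }


# All values below are hypothetical placeholders.
opts = oracle_jdbc_options(
    host="oracle.example.com",
    port=1521,
    service_name="ORCLPDB1",
    table="ANALYTICS.DAILY_SALES",
    user="etl_user",
    password="change-me",  # assumption: read this from a secret scope in practice
)

print(opts["url"])

# On a cluster with the Oracle JDBC jar installed, a DataFrame `df` could then
# be written with Spark's DataFrameWriter:
#   df.write.format("jdbc").options(**opts).mode("append").save()
```

Keeping the options in one dictionary makes it easy to reuse the same connection settings for both reads and writes.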

5 More Replies
cmilligan
by Contributor II
  • 882 Views
  • 1 replies
  • 1 kudos

Resolved! Database CICD Pipelines

My team has a shared codebase, and as we migrate to Databricks we are running into issues when two people are doing development on connected sections of our codebase. For example, if I add a column to a table for changes on my branch, other members on m...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Coleman Milligan It's really hard to create something like this without basic knowledge of how CICD should work, or even of Terraform. You can start here to understand some basics. https://servian.dev/how-to-hardening-azure-databricks-using-terraform...

kilaki
by New Contributor II
  • 2531 Views
  • 3 replies
  • 0 kudos

Query fails with 'Error occurred while deserializing arrow data' on Databricks SQL with Channel set to Preview

I noticed that a query based on inline selects and joins fails on the client with 'Error occurred while deserializing arrow data', i.e. the query succeeds on Databricks but the client (DBeaver, AtScale) receives an error. The error is only noticed with Databri...

Latest Reply
franco_patano
New Contributor III
  • 0 kudos

Opened an ES on this, looks like an issue with the Preview channel. Thanks for your help!

2 More Replies
rakeshprasad1
by New Contributor III
  • 2274 Views
  • 3 replies
  • 4 kudos

Databricks Auto Loader not updating table immediately

I have a simple Auto Loader job which looks like this:

df_dwu_limit = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "JSON") \
    .schema(schemaFromJson) \
    .load("abfss://synapse-usage@xxxxx.dfs.core.windows.net/synapse-us...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

Can you share the whole code, including the counts you mentioned?

2 More Replies