Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

CHANDAN_NANDY
by New Contributor III
  • 4707 Views
  • 2 replies
  • 4 kudos

Resolved! GitHub Copilot Support

Any idea why GitHub Copilot is not available in Azure Databricks, even though it supports GitHub?

Latest Reply
nightcoder
New Contributor II
  • 4 kudos

That is true (this is a comment rather than an answer): VS Code is supported. But VS Code does not integrate with notebooks on AWS. When will this feature be available?

1 More Replies
brickster_2018
by Databricks Employee
  • 2198 Views
  • 3 replies
  • 0 kudos

Resolved! For the Autoloader, cloudFiles.includeExistingFiles option, is ordering respected?

If yes, how is ordering ensured? For example, let's say a number of CDC change files are uploaded to a directory over time. If a table were to be created using the cloudFiles source, in what order would those files be processed?

Latest Reply
Hanish_Goel
New Contributor II
  • 0 kudos

Hi, is there any new development in terms of ensuring the ordering of files in Auto Loader?

2 More Replies
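For context, a minimal sketch of an Auto Loader stream that also picks up files already in the directory (the paths, format, and table name below are hypothetical). Note that Auto Loader does not document an ordering guarantee for cloudFiles.includeExistingFiles, so if CDC files must be applied in sequence, carry an ordering key in the data itself:

    # Hypothetical paths; cloudFiles.format depends on the file type.
    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.includeExistingFiles", "true")  # backfill existing files
          .load("s3://my-bucket/cdc-changes/"))

    (df.writeStream
       .option("checkpointLocation", "s3://my-bucket/_checkpoints/cdc")
       .toTable("bronze_cdc_changes"))

Because arrival order is not guaranteed, a common pattern is to land everything in a bronze table and apply the changes downstream with MERGE, ordered by a sequence column from the CDC payload.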
vk217
by Contributor
  • 14001 Views
  • 5 replies
  • 17 kudos

Resolved! Python wheel cannot be installed as a library.

When I try to install the Python .whl library, I get the error below. However, I can install it as a jar and it works fine. One difference is that I am creating my own cluster by cloning an existing cluster and copying the .whl to a folder called testin...

Latest Reply
vk217
Contributor
  • 17 kudos

The issue was that the package was renamed after it was installed on the cluster, and hence it was not recognized.

4 More Replies
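For reference, a minimal sketch of installing a wheel from a notebook (the DBFS path and package name are hypothetical):

    # pip resolves the wheel by its filename metadata, so the .whl must keep the
    # name it was built with; renaming it afterwards (as happened in this thread)
    # breaks the install and the package will not be recognized.
    %pip install /dbfs/FileStore/testing/mypkg-0.1-py3-none-any.whl

    import mypkg  # hypothetical package name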
140015
by New Contributor III
  • 703 Views
  • 0 replies
  • 0 kudos

DLT using the result of one view in another table with collect()

Hey, do you guys know if there is an option to implement something like this in DLT:

@dlt.view()
def view_1():
    # some calculations that return a small dataframe with around max 80 rows

@dlt.table()
def table_1():
    result_df = dlt.read("view_1")
    resu...

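A sketch of the pattern the question describes, with hypothetical names. Whether collect() on dlt.read() works inside a @dlt.table function is exactly what is being asked, so treat this as illustrative rather than confirmed:

    import dlt

    @dlt.view()
    def view_1():
        # some calculation that returns a small dataframe (at most ~80 rows)
        return spark.range(80).withColumnRenamed("id", "threshold")

    @dlt.table()
    def table_1():
        # Pull the small view to the driver and use its values, e.g. as a filter.
        thresholds = [r.threshold for r in dlt.read("view_1").collect()]
        src = dlt.read("some_source")  # hypothetical upstream dataset
        return src.where(src.value.isin(thresholds))

An alternative that avoids collect() entirely is a broadcast join against view_1, which keeps the whole computation lazy.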
ACP
by New Contributor III
  • 2383 Views
  • 4 replies
  • 2 kudos

Accreditation, Badges, Points not received

Hi there, I have completed a few courses but didn't receive any badges or points. I also did an accreditation but didn't receive anything for that either. It's already been 3 or 4 days and still nothing. I would really appreciate it if Databricks could fix this. Ma...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Andre Paiva, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

3 More Replies
KVNARK
by Honored Contributor II
  • 7218 Views
  • 3 replies
  • 8 kudos

Resolved! Advantages of Databricks Lakehouse over Azure Synapse

What are the advantages of Databricks over Azure Synapse Analytics? Most of the features, like compute and storage, look almost identical in both.

Latest Reply
Geeta1
Valued Contributor
  • 8 kudos

The link below has a good comparison of the two: https://hevodata.com/learn/azure-synapse-vs-databricks/

2 More Replies
raman
by New Contributor II
  • 1180 Views
  • 2 replies
  • 0 kudos

Spark pushdown filter not being respected on DBFS

I have a parquet file with a column g1 with schema StructField(g1, IntegerType, true). Now I have a query with a filter on g1. What's weird in the SQL viewer is that Spark is loading all the rows from that file, even though in the physical plan I can see th...

Latest Reply
raman
New Contributor II
  • 0 kudos

Thanks @Ajay Pandey, please find attached the physical plan. Query: SELECT identityMap, segmentMembership, _repo, workEmail, person, homePhone, workPhone, workAddress, personalEmail, homeAddress FROM final_segment_index_table_v2 WHERE (g1 >= 128 AND g1 <...

1 More Replies
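To see whether the predicate actually reaches the Parquet scan, the plan can be inspected directly (the upper bound in the thread is truncated, so 256 below is a placeholder, and the path is hypothetical):

    df = spark.read.parquet("/dbfs/data/final_segment_index_table_v2")
    filtered = df.where("g1 >= 128 AND g1 < 256")

    # Look for PushedFilters: [GreaterThanOrEqual(g1,128), LessThan(g1,256)]
    # in the FileScan node of the physical plan.
    filtered.explain(True)

Note that a pushed-down filter only skips whole row groups whose min/max statistics exclude the predicate; rows inside qualifying row groups are still read and filtered after decoding, which can look like "all rows loaded" in the SQL UI.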
Kopal
by New Contributor II
  • 5808 Views
  • 3 replies
  • 3 kudos

Resolved! Data Engineering - CTAS - External Tables - Limitations of CTAS for external tables: can OPTIONS and LOCATION be used?

Data Engineering - CTAS - External Tables. Can someone help me understand why, in chapter 3.3, we cannot directly use CTAS with OPTIONS and LOCATION to specify the delimiter and location of a CSV? Or have I misunderstood? Details: In Data Engineering with Databri...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

The second CTAS statement will not be able to parse the CSV in any manner, because its FROM clause just points to a file. It's more of a traditional SQL statement with SELECT and FROM. It will create a Delta table. This just happens to b...

2 More Replies
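The pattern the course builds up to looks roughly like this (paths and names are hypothetical): CTAS infers its schema from the query result and does not accept file-format OPTIONS, so the CSV parsing options go on a temporary view defined with USING CSV, and the CTAS then selects from that view:

    spark.sql("""
        CREATE OR REPLACE TEMPORARY VIEW sales_csv_vw
        USING CSV
        OPTIONS (path '/mnt/raw/sales', header 'true', delimiter '|')
    """)

    spark.sql("""
        CREATE OR REPLACE TABLE sales AS
        SELECT * FROM sales_csv_vw
    """)

The resulting table is a Delta table with the parsed schema, while the view carries the CSV-specific options.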
jt
by New Contributor III
  • 2852 Views
  • 2 replies
  • 2 kudos

SQL table alias autocomplete

I have a table with 600 columns and the table name is long. I want to use a table alias with autocomplete, but it's not working. Any ideas how I can get this to work?

Works:
%sql
-- autocomplete works
SELECT verylongtablename.column200
verylongtabl...

Latest Reply
jt
New Contributor III
  • 2 kudos

My cluster is running fine. Does autocomplete work for you with a table alias?

1 More Replies
jt
by New Contributor III
  • 2269 Views
  • 2 replies
  • 1 kudos

Table of Content consistency

When I click on header "STEP 3" in the table of contents, it takes me to the correct section. However, when I click on "STEP 2", the table of contents stays on "STEP 3". This sometimes causes confusion. For consistency, is there any way to highligh...

Latest Reply
jt
New Contributor III
  • 1 kudos

If you click on cell "Command-4", does the table of contents (on the left) highlight "Command-4"?

1 More Replies
brickster_2018
by Databricks Employee
  • 1933 Views
  • 1 replies
  • 2 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

This is because your driver is not able to talk to your nodes. To address it, you can add configuration to increase the Databricks heartbeat interval, and you can also raise the RPC max message size, which will also help. You can explore cluster configuration from here: htt...

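For reference, the kind of settings the reply alludes to would go in the cluster's Spark config (the values here are illustrative, not recommendations; spark.network.timeout must stay larger than the heartbeat interval):

    spark.executor.heartbeatInterval 60s
    spark.network.timeout 600s
    spark.rpc.message.maxSize 512

These are standard Spark properties; raising them masks slow heartbeats rather than fixing their cause, so it is also worth checking for driver memory pressure or long GC pauses.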
spott_submittab
by New Contributor II
  • 1085 Views
  • 1 replies
  • 0 kudos

A Job "pool"? (or task pool)

I'm trying to run a single job multiple times with different parameters, where the number of concurrent jobs is less than the number of parameter sets. I have a job (or task...) J that takes parameter set p. I have 100 p values I want to run, however I onl...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

This is something new and an interesting question. Try reaching out to the Databricks support team; maybe they have some good ideas here.

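One possible workaround, sketched below with hypothetical workspace details: set the job's max_concurrent_runs to the desired cap, then trigger run-now once per parameter value and have each client thread wait for its run to finish, so at most max_workers runs are in flight at any time:

    import time
    import requests
    from concurrent.futures import ThreadPoolExecutor

    HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
    HEADERS = {"Authorization": "Bearer <token>"}       # placeholder
    JOB_ID = 123                                        # hypothetical job id

    def run_and_wait(p):
        # Trigger one run for this parameter value...
        run_id = requests.post(f"{HOST}/api/2.1/jobs/run-now", headers=HEADERS,
                               json={"job_id": JOB_ID, "notebook_params": {"p": str(p)}}
                               ).json()["run_id"]
        # ...then block until it finishes, so the thread pool caps concurrency.
        while True:
            state = requests.get(f"{HOST}/api/2.1/jobs/runs/get", headers=HEADERS,
                                 params={"run_id": run_id}).json()["state"]
            if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
                return p, state.get("result_state")
            time.sleep(30)

    # At most 10 of the 100 parameterized runs are in flight at a time.
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(run_and_wait, range(100)))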
aka1
by New Contributor II
  • 2825 Views
  • 1 replies
  • 3 kudos

dbx - run unit test error (java.lang.NoSuchMethodError)

I am setting up dbx for the first time on Windows 10, strictly following https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/. OpenJDK is installed (conda install -c conda-forge openjdk=11.0.15), winutils.exe for Hadoop 3 is downloaded, pat...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 3 kudos

This seems to be a code issue only.

MaximS
by New Contributor
  • 1521 Views
  • 1 replies
  • 1 kudos

OPTIMIZE command failed to complete on partitioned dataset

Trying to optimize a Delta table with the following stats:

size: 212,848 blobs, 31,162,417,246,985 bytes
command: OPTIMIZE <table> ZORDER BY (X, Y, Z)

In the Spark UI I can see all the work divided into batches, and each batch starts with 400 tasks to collect data. But ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Can you share some sample datasets for this, so that we can debug and help you accordingly? Thanks, Aviral

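If the table is partitioned, one way to make a huge OPTIMIZE more tractable is to run it one partition at a time; OPTIMIZE accepts a WHERE clause on partition columns only (the table and partition column below are hypothetical):

    # Enumerate partitions, then optimize each one separately, so progress is
    # committed per partition instead of in one enormous job.
    dates = [r.date for r in spark.sql("SELECT DISTINCT date FROM my_table").collect()]
    for d in dates:
        spark.sql(f"OPTIMIZE my_table WHERE date = '{d}' ZORDER BY (X, Y, Z)")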
auser85
by New Contributor III
  • 3975 Views
  • 1 replies
  • 1 kudos

How to incorporate these GC options into my Databricks cluster? (spark.executor.extraJavaOptions)

I want to try incorporating these options into my Databricks cluster:

spark.driver.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark

If I put them under Compute -> Cluster -> Co...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Andrew Fogarty, I think this is only for the spark-submit command, not for the cluster UI. Please have a look at this doc: http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html

spark.executor.extraJavaOptions: A string of extra JVM...

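For reference, in the cluster UI these are entered as plain key/value lines under the cluster's Spark config (Advanced Options), e.g.:

    spark.driver.extraJavaOptions -XX:+UseG1GC
    spark.executor.extraJavaOptions -XX:+UseG1GC

One caution: -XX:+G1SummarizeConcMark is a diagnostic flag that newer JDKs no longer accept, and an unrecognized JVM flag can prevent executors from starting, so verify each flag against the JVM version of your Databricks Runtime first.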
