Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Akanksha533
by New Contributor
  • 2170 Views
  • 4 replies
  • 3 kudos
Latest Reply
Kaniz_Fatma
Community Manager

Hi @Akanksha Kumari, we haven't heard from you since the last response from @Mark Ferguson and @Hubert Dudek, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please do share it with the community ...

3 More Replies
AK032716
by New Contributor
  • 2539 Views
  • 3 replies
  • 2 kudos

Implement Auto Loader to ingest data into Delta Lake; I have 100 different tables with full load, append, and merge scenarios

I want to implement Auto Loader to ingest data into Delta Lake from 5 different source systems, and I have 100 different tables in each database. How do we dynamically address this by using Auto Loader with the trigger-once option: full load, append, merge scen...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Anil Kovilakar, we haven't heard from you since the last response from @Daniel Sahal and @Jordan Fox, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please do share it with the c...

2 More Replies
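For the Auto Loader question above, here is a minimal, hedged sketch of a metadata-driven loop: one stream per table, driven from a config list. The tables list, paths, landing file format, and checkpoint locations are invented for illustration and are not taken from the thread.

```python
# Sketch only: one Auto Loader stream per table, driven from a config list.
# The `tables` list, paths, file format, and checkpoint locations are all hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

tables = [
    {"source": "/mnt/raw/system1/customers", "target": "bronze.system1_customers"},
    {"source": "/mnt/raw/system1/orders",    "target": "bronze.system1_orders"},
    # ... one entry per table, per source system
]

for t in tables:
    (spark.readStream
        .format("cloudFiles")                                       # Auto Loader source
        .option("cloudFiles.format", "parquet")                     # format of the landed files
        .option("cloudFiles.schemaLocation", f"/mnt/_chk/{t['target']}/schema")
        .load(t["source"])
        .writeStream
        .option("checkpointLocation", f"/mnt/_chk/{t['target']}/checkpoint")
        .trigger(availableNow=True)                                 # newer replacement for trigger-once
        .toTable(t["target"]))
```

Merge-style loads typically also need a foreachBatch sink that runs a MERGE INTO per micro-batch; that part is omitted from this sketch.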
jamesw
by New Contributor II
  • 1939 Views
  • 2 replies
  • 1 kudos

Ganglia not working with custom container services

Setup: custom Docker container starting from the "databricksruntime/gpu-conda:cuda11" base image layer; 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12); multi-node, p3.8xlarge GPU compute. When I try to view Ganglia metrics I am met with "502 Bad Gatewa...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @James W, we haven't heard from you since the last response from @Vivian Wilfred, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please do share it with the community as it can be ...

1 More Replies
KVNARK
by Honored Contributor II
  • 1937 Views
  • 4 replies
  • 9 kudos

Date datatype format issue in PySpark

If anyone has encountered this date type format, 6/15/25 12:00 AM, could you mention the right format string to be used in PySpark? Thanks in advance!

Latest Reply
Kaniz_Fatma
Community Manager

Hi @KVNARK, we haven't heard from you since the last response from @Uma Maheswara Rao Desula and @Hubert Dudek, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please do share it with the community a...

3 More Replies
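For the date-format question above, a short sketch, assuming the values arrive as strings in a column (the DataFrame and column names are invented): under Spark 3 datetime patterns, `M/d/yy h:mm a` matches `6/15/25 12:00 AM`.

```python
# Sketch: parsing strings like "6/15/25 12:00 AM" in PySpark.
# The DataFrame and column names are made up for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("6/15/25 12:00 AM",)], ["event_ts_str"])

parsed = df.withColumn(
    "event_ts",
    F.to_timestamp("event_ts_str", "M/d/yy h:mm a")   # h = 12-hour clock, a = AM/PM marker
)
parsed.show(truncate=False)
```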
Etyr
by Contributor
  • 4890 Views
  • 4 replies
  • 2 kudos

Resolved! Slow fetching of results by the client in Databricks SQL when calling from an Azure Compute Instance (AML)

I'm using `databricks-sql-connector` in Python 3.8 to connect to an Azure SQL Warehouse inside an Azure Machine Learning Compute Instance. I have a query with a large result set; looking at the `query history`, I checked the time spent running the query, and se...

Latest Reply
Etyr
Contributor

So I ran a few tests. Since you said that the Databricks SQL driver wasn't made to retrieve that amount of data, I went with Spark. I fired up a small Spark cluster; the query was as fast as on the SQL Warehouse, and then I did a df.write.parquet("/my_path/...

3 More Replies
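A rough sketch of the workaround described in the reply above: run the query on a Spark cluster and persist the result as Parquet rather than pulling every row through `databricks-sql-connector`. The table name, filter, and output path are placeholders.

```python
# Sketch: execute the heavy query with Spark and write the result as Parquet files,
# instead of fetching all rows through databricks-sql-connector.
# The table name and output path below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.sql("SELECT * FROM my_schema.big_table WHERE load_date >= '2023-01-01'")

# The transfer now happens inside the cluster; the client can read the Parquet
# output directly afterwards (for example with pandas.read_parquet).
df.write.mode("overwrite").parquet("/mnt/exports/big_table_extract/")
```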
dheeraj2444
by New Contributor II
  • 1712 Views
  • 4 replies
  • 0 kudos

I am trying to write a DataFrame to a Kafka topic with an Avro schema for key and value using a schema registry URL. The to_avro function is not writing t...

I am trying to write a DataFrame to a Kafka topic with an Avro schema for key and value using a schema registry URL. The to_avro function is not writing to the topic and is throwing an exception with error code 40403. Is there an alternative way to do thi...

Latest Reply
Debayan
Esteemed Contributor III

Hi, could you please refer to https://github.com/confluentinc/kafka-connect-elasticsearch/issues/59 and let us know if this helps.

3 More Replies
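The failing code is not shown in the thread, so the following is only a hedged sketch of one alternative: using the open-source `pyspark.sql.avro.functions.to_avro` with an explicit JSON Avro schema instead of the schema registry URL. This produces plain Avro payloads (not the Confluent wire format a registry-aware consumer may expect), and the broker address, topic, schema, and columns are invented.

```python
# Sketch: Avro-encode key and value with an explicit schema and write to Kafka.
# Broker, topic, schema, and column names are hypothetical. This skips the schema
# registry, so consumers expecting the Confluent wire format will need adjustments.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.avro.functions import to_avro

spark = SparkSession.builder.getOrCreate()

value_schema = """
{"type": "record", "name": "event",
 "fields": [{"name": "id", "type": "long"}, {"name": "payload", "type": "string"}]}
"""

df = spark.createDataFrame([(1, "hello")], ["id", "payload"])

out = df.select(
    to_avro(F.col("id").cast("string")).alias("key"),               # Avro-encoded key
    to_avro(F.struct("id", "payload"), value_schema).alias("value"),
)

(out.write
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "my_topic")
    .save())
```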
Cano
by New Contributor III
  • 2410 Views
  • 5 replies
  • 2 kudos

SQL warehouse failing to start (Please check network connectivity from the data plane to the control plane)

Hi, my SQL warehouse is failing to start with the following error message: Details for the latest failure: Error: [id: InstanceId(i-01b84b6705ff09104), status: INSTANCE_INITIALIZING, workerEnvId: WorkerEnvId(workerenv-3023557811934763-c8cef827-a038-455...

Latest Reply
Debayan
Esteemed Contributor III

Hi, there is a line in the attached logs as below: [Bootstrap Event] Can reach ohio.cloud.databricks.com: [FAILED] [Bootstrap Event] DNS output for databricks-prod-artifacts-us-east-2.s3.us-east-2.amazonaws.com: Server: 10.187.0.2 Address: 10.187.0.2#5...

4 More Replies
Mahesh777k
by New Contributor
  • 1832 Views
  • 3 replies
  • 2 kudos

How to delete duplicate tables?

Hi everyone, I accidentally imported duplicate tables; please guide me on how to delete them using Databricks Community Edition.

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Mahesh Babu Uppala, we haven't heard from you since the last response from @Uma Maheswara Rao Desula and @Ratna Chaitanya Raju Bandaru, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solu...

2 More Replies
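For the duplicate-table question above, a small sketch; the schema and table names are invented, so list what exists and double-check before dropping anything.

```python
# Sketch: inspect and then drop accidentally duplicated tables.
# The schema and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# See what actually exists before deleting anything.
spark.sql("SHOW TABLES IN default").show(truncate=False)

for name in ["sales_copy1", "sales_copy2"]:        # the duplicates to remove
    spark.sql(f"DROP TABLE IF EXISTS default.{name}")
```

DROP TABLE removes the table metadata, and for managed tables also the underlying data; for external tables the files remain in storage.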
databicky
by Contributor II
  • 3321 Views
  • 7 replies
  • 8 kudos

Resolved! How can we move an Excel file from ADLS to SharePoint?

I have an Excel file in ADLS and I want to move that file into SharePoint. I tried this method in Data Factory, but SharePoint is not available as a sink. Is there any possible way to do this?

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Mohammed sadamusean, we haven't heard from you since the last response from @Daniel Sahal and @KVNARK, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please do share it with th...

6 More Replies
SeliLi_52097
by New Contributor III
  • 2328 Views
  • 5 replies
  • 5 kudos

Databricks Academy webpage showing insecure connection (in Chrome)

When I was trying to visit the Databricks Academy website https://customer-academy.databricks.com, it showed an insecure connection warning (see the attached screenshot). This happened on 8 January 2023 (AEDT) around 12:30 pm.

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Selina Li, thank you for reaching out! Let us look into this for you, and we'll circle back with an update.

4 More Replies
prashantp
by New Contributor III
  • 1749 Views
  • 2 replies
  • 1 kudos

Resolved! CloudFormation failure while creating workspace

The CloudFormation script fails to create a workspace due to the following error. The solution to this error is not available in the standard troubleshooting guide. Below is the error message from the CloudWatch logs. It seems like a bug in the API, b...

Latest Reply
prashantp
New Contributor III

I found a solution to this by creating a new role following these articles: https://docs.databricks.com/administration-guide/account-api/iam-role.html and https://docs.databricks.com/administration-guide/account-api/aws-storage.html

1 More Replies
Tacuma
by New Contributor II
  • 1272 Views
  • 4 replies
  • 1 kudos

Scheduling jobs with Airflow results in each task running multiple jobs.

Hey everyone, I'm experimenting with running containerized PySpark jobs in Databricks and orchestrating them with Airflow. I am, however, encountering an issue here. When I trigger an Airflow DAG and look at the logs, I see that Airflow is spinni...

Latest Reply
Tacuma
New Contributor II

Both, I guess? Yes, all jobs share the same config. The question I have is why, in the same Airflow task log, there are 3 job runs. I'm hoping that there's something in the configs that may give me some kind of clue.

3 More Replies
databicky
by Contributor II
  • 1646 Views
  • 3 replies
  • 0 kudos

How to check a particular column value in a Spark DataFrame?

I want to check that a particular column in a DataFrame contains zero; if it does not contain zero, the job needs to fail.

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Mohammed sadamusean, we haven't heard from you since the last response from @Mateusz Łomański, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the co...

2 More Replies
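For the question above, a minimal sketch of one way to fail a run when a column contains no zero values; the DataFrame and column name are invented.

```python
# Sketch: raise an error if the given column never contains the value 0.
# The DataFrame and column name are made up for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1,), (0,), (7,)], ["amount"])

# limit(1) lets Spark stop scanning as soon as one matching row is found.
has_zero = df.filter(F.col("amount") == 0).limit(1).count() > 0

if not has_zero:
    raise ValueError("Column 'amount' contains no zero values, failing the run.")
```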
Jennifer
by New Contributor III
  • 1868 Views
  • 1 replies
  • 0 kudos

How do I update an aggregate table using a Delta Live Table?

I am using Delta Live Tables to stream events. I have a raw table for all the events and a downstream aggregate table. I need to add the new aggregated number to the downstream table's aggregate column, but I didn't find any recipe talking abou...

Latest Reply
Jennifer
New Contributor III

Maybe my code is correct already, since I use dlt.read("my_raw_table") instead of dlt.read_stream("my_raw_table"), so col_aggr is recalculated completely every time my_raw_table is updated.

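A minimal sketch of the pattern the reply above describes: the downstream aggregate reads the raw table with dlt.read(), so it is recomputed in full on every pipeline update. Table, column, and path names are invented; `spark` is provided by the DLT runtime inside a pipeline notebook.

```python
# Sketch of a Delta Live Tables pipeline: a streaming raw table plus a downstream
# aggregate built with dlt.read(), so the aggregate is fully recomputed on each update.
# Paths, table names, and columns are hypothetical; `spark` is supplied by the DLT runtime.
import dlt
from pyspark.sql import functions as F


@dlt.table
def my_raw_table():
    # Ingest raw events as a stream (an Auto Loader source is assumed here).
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events/")
    )


@dlt.table
def my_aggregate_table():
    # dlt.read() returns a batch view of my_raw_table, so col_aggr is recalculated
    # completely whenever the pipeline updates (contrast with dlt.read_stream()).
    return (
        dlt.read("my_raw_table")
        .groupBy("event_type")
        .agg(F.sum("amount").alias("col_aggr"))
    )
```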
pasiasty2077
by New Contributor
  • 3567 Views
  • 2 replies
  • 1 kudos

Partition filter is skipped when a table is used in the WHERE condition. Why?

Hi, maybe someone can help me. I want to run a very narrow query: SELECT * FROM my_table WHERE snapshot_date IN ('2023-01-06', '2023-01-07') -- part of the physical plan: -- Location: PreparedDeltaFileIndex [dbfs:/...] -- PartitionFilters: [cast(snaps...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Mariusz J, we haven't heard from you since the last response from @Werner Stinckens, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to oth...

1 More Replies
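The question above is truncated, but if the issue is that a subquery over another table in the WHERE clause prevents partition pruning, one commonly suggested workaround is sketched below: collect the partition values first and pass them as literals so a PartitionFilter can be applied. The control table name is invented; my_table and snapshot_date come from the question.

```python
# Sketch: collect the wanted snapshot dates to the driver first, then filter with
# literals so the optimizer can apply PartitionFilters on snapshot_date.
# The control table name is hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dates = [
    row["snapshot_date"]
    for row in spark.table("control.snapshots_to_load").select("snapshot_date").collect()
]

in_list = ", ".join(f"'{d}'" for d in dates)
df = spark.sql(f"SELECT * FROM my_table WHERE snapshot_date IN ({in_list})")

df.explain()   # the physical plan should now show a PartitionFilters entry
```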