Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brokeTechBro
by New Contributor II
  • 1463 Views
  • 2 replies
  • 0 kudos

Bug Community Edition Sign Up Error

Please help here. Bug: Community Edition sign-up fails with "an error occurred, please try again later". I am frustrated.

Latest Reply
GSam
New Contributor II
  • 0 kudos

@gchandra The issue is still there. Tried it on multiple browsers (Incognito and otherwise) and on multiple devices on different networks. Still unable to sign up after 2 days of trying.

1 More Replies
Miasu
by New Contributor II
  • 5556 Views
  • 2 replies
  • 0 kudos

Unable to analyze external table | FileAlreadyExistsException

Hello experts, There's a csv file, "nyc_taxi.csv", saved under users/myfolder on DBFS, and I used this file to create 2 tables: 1. nyc_taxi: created using the UI, and it appeared as a managed table saved under dbfs:/user/hive/warehouse/mydatabase.db/nyc...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Did you initially want to create an external or a managed table? Just trying to understand your intent for the file.

1 More Replies
RantoB
by Valued Contributor
  • 32089 Views
  • 8 replies
  • 7 kudos

Resolved! read csv directly from url with pyspark

I would like to load a csv file directly into a Spark dataframe in Databricks. I tried the following code: url = "https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-national-tr/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_fo...

Latest Reply
anwangari
New Contributor II
  • 7 kudos

Hello, it's the end of 2024 and I still have this issue with Python. As mentioned, the sc method no longer works. Also, working with volumes within "/databricks/driver/" is not supported in Apache Spark. ALTERNATIVE SOLUTION: Use requests to download the file fr...

7 More Replies
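The reply above is truncated, but the approach it names (download the file over plain HTTP first, then hand the local copy to Spark) can be sketched roughly as below. The helper names and the `file://` read are illustrative, not the poster's exact code:

```python
import os
import tempfile
import urllib.request

def url_to_local_path(url: str, dest_dir: str) -> str:
    """Derive a local filename for a CSV download from its URL."""
    name = url.rstrip("/").split("/")[-1].split("?")[0] or "download"
    if not name.endswith(".csv"):
        name += ".csv"
    return os.path.join(dest_dir, name)

def fetch_csv(url: str, dest_dir: str = "") -> str:
    """Download the CSV over HTTP to local disk; no Spark involved yet."""
    dest_dir = dest_dir or tempfile.gettempdir()
    path = url_to_local_path(url, dest_dir)
    urllib.request.urlretrieve(url, path)
    return path

# Inside Databricks you would then read the local file with Spark:
# path = fetch_csv(url)
# df = spark.read.csv(f"file://{path}", header=True, inferSchema=True)
```

This sidesteps the old `sc.addFile` trick entirely: the download is ordinary Python, and Spark only ever sees a local path.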
abaghel
by New Contributor II
  • 1611 Views
  • 2 replies
  • 0 kudos

Azure application insights logging not working after upgrading cluster to databricks runtime 14.x

I have a basic code setup to read a stream from a Delta table and write it into another Delta table. I am using logging to send logs to Application Insights. However, within the foreachBatch function, the logs I write are not being sent to Applicatio...

Latest Reply
abaghel
New Contributor II
  • 0 kudos

@MuthuLakshmi  Thank you for getting back to me. I have read the article and understand that "Any files, modules, or objects referenced in the function must be serializable and available on Spark." However, based on the code provided, can you help me...

1 More Replies
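A common cause of this symptom on newer runtimes is that a logger configured on the driver is not available where `foreachBatch` executes. A hedged sketch of the usual workaround, building the logger lazily inside the batch function; the Application Insights handler is stubbed with a `StreamHandler` here, so swap in your real handler:

```python
import logging

def get_batch_logger(name: str = "stream-batches") -> logging.Logger:
    """Create and configure the logger inside the executing process."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # guard against adding a handler per micro-batch
        handler = logging.StreamHandler()  # placeholder for an App Insights handler
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

def process_batch(batch_df, batch_id):
    # The logger is built here, not captured from the driver scope,
    # so nothing non-serializable travels with the function.
    logger = get_batch_logger()
    logger.info("processing batch %s", batch_id)
    # batch_df.write.format("delta").mode("append").saveAsTable(...)

# query = df.writeStream.foreachBatch(process_batch).start()  # Databricks only
```

The `if not logger.handlers` guard matters because `foreachBatch` runs once per micro-batch and `logging.getLogger` returns the same cached object each time.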
None123
by New Contributor III
  • 12562 Views
  • 3 replies
  • 3 kudos

Open a Support Ticket

Anyone know how to submit a support ticket? I keep getting into a loop that takes me back to the community page, but I need to submit an urgent ticket. I'm told our company pays a ridiculous sum for this feature, yet it is impossible to find. Thanks ...

Latest Reply
vickytscv
New Contributor II
  • 3 kudos

Hi Team, we are working with the Adobe tool for campaign metrics, which needs to pull data from AEP using the explode option. When we pass a query it takes a long time and performance is poor. Is there a better way to pull data from AEP? Please le...

2 More Replies
cltj
by New Contributor III
  • 14344 Views
  • 5 replies
  • 2 kudos

Experiences using managed tables

We are looking into the use of managed tables on Databricks. As this decision won't be easy to reverse, I am reaching out to all of you fine folks to learn more about your experience with using this. If I understand correctly, we don't have to deal with ...

Latest Reply
JimmyEatBrick
Databricks Employee
  • 2 kudos

Databricks recommends ALWAYS using managed tables UNLESS: your tables are not Delta, or you explicitly need the table files in a specific location. Managed tables are just better... Databricks manages: the upgrades (Deletion Vectors? Column M...

4 More Replies
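To make the managed-vs-external distinction in the reply concrete: omitting `LOCATION` in the DDL gives you a managed table, adding it gives you an external one. An illustrative pair of statements (the catalog, schema, and storage path are made up):

```python
# Managed: Databricks owns the files; dropping the table can remove the data.
managed_ddl = """
CREATE TABLE main.sales.orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
)
"""

# External: the files are pinned to an explicit path; dropping the table
# leaves the files in place.
external_ddl = """
CREATE TABLE main.sales.orders_ext (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
)
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/sales/orders'
"""

# spark.sql(managed_ddl)  # run inside Databricks
```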
PaoloF
by New Contributor II
  • 934 Views
  • 1 replies
  • 0 kudos

Lakehouse Federation roadmap

Hi all, is there a roadmap to increase the number of sources available for Lakehouse Federation? I'm interested to know if and when it will be possible to create a foreign catalog with MariaDB. Thanks

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @PaoloF, as of now there is no ETA for adding MariaDB to Lakehouse Federation. I will raise an internal feature request for this source to be considered for implementation. As for the roadmap, it might need to be followed up with the account...

ImranA
by Contributor
  • 3032 Views
  • 5 replies
  • 3 kudos

Resolved! Schema issue when dropping a delta live table

For example, there is a table called "cars". If I remove the table from the DLT pipeline and drop the table from the catalogue, then change the schema of the table and create the table again using the same table name "cars" through the same pipeline, why...

Latest Reply
gchandra
Databricks Employee
  • 3 kudos

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/schema.html#how-does-auto-loader-schema-evolution-work https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/patterns.html#enable-easy-etl

4 More Replies
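The linked docs boil down to this: Auto Loader persists its inferred schema at a `schemaLocation` path, and if that path survives a table drop, the old schema can resurface when the pipeline recreates the table. A hedged sketch of the relevant options (paths are made up):

```python
def autoloader_options(fmt: str, schema_path: str) -> dict:
    """Options for a cloudFiles (Auto Loader) readStream.

    schema_path is where Auto Loader persists the inferred schema across
    runs; dropping the table but keeping this path is one way a stale
    schema can come back.
    """
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.schemaLocation": schema_path,
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }

# On Databricks:
# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_options("csv", "/Volumes/main/default/_schemas/cars"))
#       .load("/Volumes/main/default/raw/cars"))
```

Clearing (or relocating) the schema location when the table is recreated is the usual way to make a schema change stick.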
charl-p-botha
by Databricks Partner
  • 1318 Views
  • 1 replies
  • 1 kudos

Thank you for the "setting up tables" speed-up from dlt release 2024.42.rc0 to 2024.44.rc1

Dear Databricks people, we are currently measuring DLT performance and cost on a medallion architecture with 150 to 300 tables, and we're interested in adding even more tables. I've been doing automated incremental streaming DLT pipelines every 3 hours...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

Hi @charl-p-botha, thanks for your feedback! Thank you for sharing your experience with Delta Live Tables (DLT) performance improvements. We are pleased to hear that the upgrade to dlt:15.4.4-delta-pipelines-dlt-release-2024.44-rc1-commit-1a62345-ima...

Subhrajyoti
by Databricks Partner
  • 4429 Views
  • 1 replies
  • 0 kudos

Deriving a relation between spark job and underlying code

For one of our requirements, we need to derive a relation between the Spark job, stage, and task ID and the underlying code executed after a workflow job is triggered using a job cluster. So far we have been able to develop a relation between the Workflow ...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @Subhrajyoti thanks for your question! I'm not sure if you have tried this already, but by combining listener logs with structured tabular data, you can create a clear mapping between Spark job executions and the corresponding notebook code. You c...

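One low-tech piece of the mapping the reply describes is tagging each Spark action with a description that encodes where in the code it came from, so listener and event-log records can be joined back to the notebook. A sketch; the tag format is invented here:

```python
def job_tag(workflow: str, notebook: str, step: str) -> str:
    """Build a stable description string to attach to Spark jobs."""
    return f"{workflow}/{notebook}#{step}"

# In a Databricks notebook, set the tag before each action; it then appears
# as the job description in the Spark UI and in SparkListener events:
# spark.sparkContext.setJobDescription(job_tag("nightly_etl", "load_orders", "dedupe"))
# df.write.saveAsTable("main.etl.orders")
```

Every job (and its stages and tasks) triggered while the description is set carries the tag, which gives you the job-to-code join key the post is after.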
OldManCoder
by New Contributor II
  • 1624 Views
  • 2 replies
  • 2 kudos

Resolved! Should Vacuum Be Tied to Workflows?

I have a process expected to run every two weeks. Throughout the process (~30 notebooks), when I write to a table for the last time in the overall process, I run my vacuum as below; I'm never running a vacuum against the same table twice. I've no...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

Hi @OldManCoder, thanks for your question! 1) Yes, separating cleanup tasks into a dedicated workflow is often more efficient. Here's why: Performance: Vacuum and optimization are resource-intensive operations. Running them inline with your primary ...

1 More Replies
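The dedicated-workflow idea from the reply can be as small as one maintenance notebook that loops over the tables and vacuums each, scheduled on its own cadence. A sketch, with made-up table names and Delta's default 7-day retention:

```python
def vacuum_statements(tables, retain_hours: int = 168):
    """SQL for a standalone maintenance job (168h = Delta's 7-day default)."""
    return [f"VACUUM {table} RETAIN {retain_hours} HOURS" for table in tables]

# In the maintenance notebook, run as its own scheduled workflow:
# for stmt in vacuum_statements(["main.sales.orders", "main.sales.items"]):
#     spark.sql(stmt)
```

Keeping the table list in one place also makes it easy to confirm no table is vacuumed twice, which was the original poster's invariant.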
huytran
by New Contributor III
  • 7020 Views
  • 7 replies
  • 0 kudos

Cannot write data to BigQuery when using Databricks secret

I am following this guide on writing data to a BigQuery table. Right now, I get an error when I try to write data using a Databricks secret instead of the JSON credential file and setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. java....

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

It seems that nothing is being loaded into GOOGLE_APPLICATION_CREDENTIALS. From https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/INSTALL.md: # The JSON keyfile of the service account used for GCS # access when google.clou...

6 More Replies
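A common pattern for the problem in this thread is to materialize the service-account JSON from the secret scope into a local file and point `GOOGLE_APPLICATION_CREDENTIALS` at it before the BigQuery write. A hedged sketch; the secret scope and key names are placeholders:

```python
import json
import os
import tempfile

def install_gcp_credentials(creds_json: str) -> str:
    """Write the service-account JSON to disk and export the env var."""
    json.loads(creds_json)  # fail fast if the secret is not valid JSON
    path = os.path.join(tempfile.gettempdir(), "gcp-service-account.json")
    with open(path, "w") as f:
        f.write(creds_json)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path

# On Databricks:
# creds = dbutils.secrets.get(scope="gcp", key="bq-service-account")
# install_gcp_credentials(creds)
# df.write.format("bigquery").option("table", "project.dataset.table").save()
```

Note that the env var must be set in the same process (and before) the connector reads it; setting it on the driver does not propagate to executors automatically.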
Vetrivel
by Databricks Partner
  • 2638 Views
  • 4 replies
  • 0 kudos

PowerBI performance with Databricks

We have integrated Power BI with Databricks to generate reports. However, Power BI generates over 8,000 lines of code, including numerous OR clauses, which cannot be modified at this time. This results in queries that take more than 4 minutes to execut...

Latest Reply
Vetrivel
Databricks Partner
  • 0 kudos

Attached is a sample query generated by Power BI. Without the OR conditions, the query runs within seconds.

3 More Replies
TamD
by Contributor
  • 9042 Views
  • 7 replies
  • 2 kudos

How do I drop a delta live table?

I'm a newbie and I've just done the "Run your first Delta Live Tables pipeline" tutorial. The tutorial downloads a publicly available csv baby names file and creates two new Delta Live tables from it. Now I want to be a good dev and clean up the reso...

Latest Reply
ImranA
Contributor
  • 2 kudos

@gchandra For example, a table called "cars": if I remove the table from the DLT pipeline and drop the table from the catalog, then change the schema of the table and create the table again using the same table name "cars" through the same pipeline, why ...

6 More Replies
furkancelik
by New Contributor II
  • 5069 Views
  • 3 replies
  • 1 kudos

How to use Databricks Unity Catalog as metastore for a local spark session

Hello, I would like to access Databricks Unity Catalog from a Spark session created outside the Databricks environment. Previously, I used the Hive metastore and didn't face any issues connecting this way. Now I've switched the metastore to Unity Cata...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@furkancelik Glad it helps. I just found this article, which I believe will clarify many of your doubts. Please refer straight to the section "Accessing Databricks UC from the PySpark shell". Notice the "unity" in the configuration strings will be your UC Def...

2 More Replies
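For readers who cannot open the article the reply mentions, the gist for a local PySpark shell is a handful of catalog configs pointing at the UC endpoint. The keys below are an assumption modeled on the open-source unitycatalog-spark connector, not the article's exact strings; the host, token, and the "unity" catalog name are placeholders:

```python
def uc_spark_conf(host: str, token: str, catalog: str = "unity") -> dict:
    """Spark conf entries wiring a local session to a Unity Catalog endpoint.

    Assumed keys, modeled on the OSS unitycatalog-spark connector; check
    the connector docs for your version before relying on them.
    """
    return {
        f"spark.sql.catalog.{catalog}": "io.unitycatalog.spark.UCSingleCatalog",
        f"spark.sql.catalog.{catalog}.uri": host,
        f"spark.sql.catalog.{catalog}.token": token,
        "spark.sql.defaultCatalog": catalog,
    }

# Locally (with pyspark and the connector jar available), something like:
# builder = SparkSession.builder
# for k, v in uc_spark_conf("https://<endpoint>", "<token>").items():
#     builder = builder.config(k, v)
# spark = builder.getOrCreate()
```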