Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

by mkd (New Contributor II)
  • 4863 Views
  • 4 replies
  • 5 kudos

Resolved! CSV import error

Upload Error: Error occurred when processing file tips1.csv: [object Object]. I've been trying to import a CSV file from my local machine into Databricks, but I can't resolve the error above. Can anyone please help me with this?

Latest reply by clentin (Contributor):

@Kaniz_Fatma - this is now fixed. Thank you so much for your prompt action. Appreciate it. 

by aalanis (New Contributor II)
  • 489 Views
  • 4 replies
  • 2 kudos

Issues reading JSON files with Databricks vs. OSS PySpark

Hi everyone, I'm currently developing an application that reads JSON files with a nested structure. I developed my code locally on my laptop using the open-source version of PySpark (3.5.1), with code similar to this: sample_schema: schema = Struct...

Latest reply by Kaniz_Fatma (Community Manager):

Hi @aalanis, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the responses and choose the one that best answers your question? Your feedb...

by stefano0929 (New Contributor II)
  • 249 Views
  • 1 replies
  • 0 kudos

Error 301 Moved Permanently in plotting cells

Hi, I created a workbook for academic purposes and had completed it, when suddenly all the chart plot cells (and only those) started returning the following error. I really don't know how to solve it by today. Failed to store th...

Latest reply by Kaniz_Fatma (Community Manager):

Hi @stefano0929, This has been fixed now. Could you please confirm?

by Bhabs (New Contributor)
  • 221 Views
  • 2 replies
  • 0 kudos

Replace one tag (key) in a JSON string in a Databricks table

There is a column (src_json) in emp_table. I need to rename the key ages to age in every JSON document in the src_json column of emp_table. Can you please suggest the best way to do it?
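Since src_json holds JSON text, one option is to parse each document and rename the key instead of doing a plain string replace. Below is a minimal pure-Python sketch, assuming every non-null value is valid JSON and that ages should be renamed at any nesting depth. rename_key and _rename are hypothetical helpers; a function like this could, for example, be wrapped in a Spark UDF, though Spark's built-in JSON functions may perform better on large tables.

```python
import json
from typing import Any, Optional


def _rename(node: Any, old: str, new: str) -> Any:
    """Recursively rename dictionary keys equal to `old`, at any nesting depth."""
    if isinstance(node, dict):
        # If one object contains both `old` and `new`, the later key's value wins.
        return {(new if k == old else k): _rename(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [_rename(item, old, new) for item in node]
    return node


def rename_key(src_json: Optional[str], old: str = "ages", new: str = "age") -> Optional[str]:
    """Parse one JSON document, rename the key, and serialize it back (NULL stays NULL)."""
    if src_json is None:
        return None
    # Parsing, rather than a text replace, leaves values and keys such as "wages" untouched.
    return json.dumps(_rename(json.loads(src_json), old, new))


print(rename_key('{"name": "Ann", "ages": 31, "wages": 100}'))
# {"name": "Ann", "age": 31, "wages": 100}
```

Note that json.dumps normalizes whitespace, so the rewritten strings may not be byte-identical to the originals apart from the renamed key.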

Latest reply by Kaniz_Fatma (Community Manager):

Hi @Bhabs, Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback no...

by kwinsor5 (New Contributor II)
  • 1899 Views
  • 2 replies
  • 0 kudos

Delta Live Tables: Auto Loader's inferColumnTypes does not work

I am experimenting with DLTs/Auto Loader. I have a simple, flat JSON file that I am attempting to load into a DLT (following this guide) like so:
CREATE OR REFRESH STREAMING LIVE TABLE statistics_live
COMMENT "The raw statistics data"
TBLPROPERTIES (...

Latest reply by pavlos_skev (New Contributor III):

I had the same issue with a JSON structure similar to yours. Adding the option "multiLine" set to true fixed it for me:
df = (spark.readStream.format("cloudFiles")
  .option("multiLine", "true")
  .option("cloudFiles.schemaLocation", schemaLocation)
  ...
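Why multiLine can matter here: by default, Spark's JSON reader expects JSON Lines (one complete record per line), so a pretty-printed record spread over several lines cannot be parsed line by line, while multiLine makes the reader parse each file as a whole document. The sketch below is a pure-Python imitation of the two modes using the standard json module, not Spark code; the sample record and the helper names are hypothetical.

```python
import json

# One pretty-printed record spread over several lines (hypothetical sample data).
pretty = """{
  "player": "A",
  "points": 12
}"""


def parse_json_lines(text: str) -> tuple:
    """Line-by-line parsing, similar in spirit to the default (multiLine=false) mode."""
    records, bad = [], 0
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Spark's default permissive mode would not fail here; it would
            # typically yield null or corrupt-record rows instead.
            bad += 1
    return records, bad


def parse_whole_document(text: str) -> list:
    """Whole-file parsing, similar in spirit to multiLine=true."""
    doc = json.loads(text)
    return doc if isinstance(doc, list) else [doc]


print(parse_json_lines(pretty))      # ([], 4)
print(parse_whole_document(pretty))  # [{'player': 'A', 'points': 12}]
```

Every fragment of the pretty-printed record fails on its own, which is consistent with inference producing an unexpected schema until the whole file is read as one document.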

by Algocrat (New Contributor II)
  • 1681 Views
  • 2 replies
  • 2 kudos

Resolved! Discover and redact PII

Hi! What is the best way to discover and redact PII? Does Databricks offer any frameworks, methods, or processes that we can follow?
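As a rough illustration of the discover-then-redact pattern (not a Databricks feature), here is a minimal regex-based sketch in plain Python. The categories and patterns are hypothetical and deliberately simple: they only catch well-formatted emails and phone-like digit runs, will miss names and addresses, and the phone pattern can also flag things like dates.

```python
import re

# Hypothetical, deliberately simple patterns; real PII discovery needs far more
# coverage (names, addresses, national IDs), often via NER-based detection.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def discover_pii(text: str) -> dict:
    """Return the matches found for each PII category present in `text`."""
    return {label: pat.findall(text) for label, pat in PII_PATTERNS.items() if pat.search(text)}


def redact_pii(text: str) -> str:
    """Replace every match with a placeholder such as [EMAIL]."""
    for label, pat in PII_PATTERNS.items():
        text = pat.sub(f"[{label}]", text)
    return text


sample = "Contact jane.doe@example.com or +1 555-123-4567."
print(discover_pii(sample))
print(redact_pii(sample))  # Contact [EMAIL] or [PHONE].
```

Keeping discovery separate from redaction lets you review what was found (for example, to tag sensitive columns) before deciding how to mask it.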

Latest reply by Kaniz_Fatma (Community Manager):

Hi @Algocrat, Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback...

by ashkd7310 (New Contributor II)
  • 369 Views
  • 3 replies
  • 4 kudos

date type conversion error

Hello, I am trying to convert a date to MM/dd/yyyy format. I first use the date_format function to convert the date to MM/dd/yyyy, so it becomes a string. However, my use case requires the data to be a date, so I am again converting the str...
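The underlying point is that a date value has no display format; MM/dd/yyyy exists only in a string rendering of it. In Spark, date_format turns a date into a string, and to_date(col, "MM/dd/yyyy") parses such a string back into a date, which is then shown in the default yyyy-MM-dd form. The same round trip in plain Python (an analogy, not Spark code; the sample date is arbitrary):

```python
from datetime import date, datetime

d = date(2024, 7, 4)

# Formatting produces a string; the MM/dd/yyyy layout lives only in the string.
as_text = d.strftime("%m/%d/%Y")  # analogous to Spark's date_format(col, "MM/dd/yyyy")

# Parsing the string back yields a date again, which carries no stored format.
parsed = datetime.strptime(as_text, "%m/%d/%Y").date()  # analogous to to_date(col, "MM/dd/yyyy")

print(as_text)      # 07/04/2024
print(parsed)       # 2024-07-04
print(parsed == d)  # True
```

So if the column must stay a date, the usual choice is to keep the date type in storage and apply MM/dd/yyyy formatting only where the value is displayed or exported.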

Latest reply by Kaniz_Fatma (Community Manager):

Hi @ashkd7310, Thank you for reaching out to our community! We're here to help. To ensure we provide you with the best support, could you please take a moment to review the responses and choose the one that best answers your question? Your feedback n...

by kapilb (New Contributor III)
  • 1059 Views
  • 6 replies
  • 2 kudos

Resolved! Problem accessing tables and uploading files

Hello team, I am new to Databricks and am using Databricks Community Edition. A few days ago I was able to access my tables and create tables by uploading CSV files, but now I am getting the error "File Browsing Error". It says Workspace is not set in Cu...

Latest reply by Kaniz_Fatma (Community Manager):

Hi @kapilb, This was a global issue and is fixed now.

by AndreasB (New Contributor II)
  • 386 Views
  • 2 replies
  • 1 kudos

Seeing results of materialized views while running notebooks

Hi! My team is currently trying out Delta Live Tables (DLT) to manage our ETL pipelines. An issue we're encountering is that we have notebooks that transform data using Spark SQL. We include these in a DLT pipeline, and we want to both run the pipe...

Latest reply by Kaniz_Fatma (Community Manager):

Hi @AndreasB, The issue arises because DLT requires the use of the LIVE keyword to track dependencies within the pipeline, but this conflicts with running individual notebooks outside the pipeline context. You can continue using your current workarou...

by Splush (New Contributor II)
  • 411 Views
  • 1 replies
  • 1 kudos

Resolved! Row Level Security while streaming data with Materialized views

Hey, I have the following problem when trying to add row-level security to some of our materialized views. According to the documentation, this feature is still in preview; nevertheless, I am trying to understand why this doesn't work and how it would be sup...

Latest reply by raphaelblg (Honored Contributor II):

Hello @Splush, there are currently 2 ways to create materialized views:
1. Through Databricks SQL: Use Materialized Views in Databricks SQL. These are the limitations.
2. Through DLT: Materialized View (DLT). All DLT tables are subject to ...

by Giorgi (Contributor)
  • 3563 Views
  • 4 replies
  • 4 kudos

GitLab integration

I've followed the instructions and set up the GitLab integration:
1. Generated a Personal Access Token from GitLab
2. Added the token (from step 1) to User settings (GitLab, email, token)
3. In Admin console -> Repos Git URL Allow List permissions: Disabled (no restrictions)
4. In Adm...

Latest reply by joshuat (New Contributor III):

I discovered the solution to my problem: Databricks Git integration does not support project-level access tokens. It only supports user-level personal access tokens. When I switched my Git credentials to use a personal access tok...

by robertkoss (New Contributor III)
  • 935 Views
  • 8 replies
  • 1 kudos

Auto Loader schema hints are not taken into account in the schema file

I am using Auto Loader with schema inference to automatically load some data into S3. I have one column that is a map, which overwhelms Auto Loader (it tries to infer it as a struct, creating a struct with every key as a field), so I just use a sc...
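To see why map-like columns trip up inference, here is a toy Python model (not Auto Loader's actual algorithm): inferring a struct turns every key ever observed into its own field, so the schema keeps growing with the data, while a map keeps the keys as values under one fixed type. In Auto Loader, the cloudFiles.schemaHints option is the documented way to pin such a column to a map type, e.g. "attributes MAP<STRING,STRING>"; the column name attributes and the sample records below are hypothetical.

```python
import json

# Hypothetical records whose "attributes" column has data-dependent keys.
records = [
    '{"id": 1, "attributes": {"color": "red"}}',
    '{"id": 2, "attributes": {"size": "L", "sku_123": "x"}}',
    '{"id": 3, "attributes": {"sku_456": "y"}}',
]


def infer_struct_fields(lines, column):
    """Toy struct inference: every key ever seen becomes a separate field."""
    fields = set()
    for line in lines:
        fields.update(json.loads(line).get(column, {}).keys())
    return sorted(fields)


def as_map_rows(lines, column):
    """Map-style handling: keys stay as data, so the type is just MAP<STRING,STRING>."""
    return [json.loads(line).get(column, {}) for line in lines]


print(infer_struct_fields(records, "attributes"))  # ['color', 'size', 'sku_123', 'sku_456']
print(as_map_rows(records, "attributes"))
```

With real data containing thousands of distinct keys, the struct view grows accordingly, which is why pinning the column as a map is usually the more stable choice.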

Latest reply by Witold (Contributor III):

Sorry, I didn't mean that your solution is poorly designed. I was only referring to one of the main definitions of your bronze layer: you want a defined and optimized data layout that is source-driven at the same time. In other words: ...

by phanindra (New Contributor III)
  • 641 Views
  • 3 replies
  • 5 kudos

Resolved! Support for Varchar data type

In the official documentation for supported data types, VARCHAR is not listed, but the product allows us to create a field of VARCHAR type. We are building an integration with Databricks and are unsure whether we should support operatio...

Latest reply by szymon_dybczak (Contributor III):

Hi @phanindra, to be precise, the Delta Lake format is based on Parquet files. For strings, Parquet has only one data type: StringType. So, under the hood, the varchar(n) data type is represented as a string with a check constraint on the length of the st...
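A small Python model of the idea in this reply, assuming the only difference between VARCHAR(n) and STRING is a length rule enforced on write. VarcharColumn is a hypothetical class for illustration, not a Databricks or Delta Lake API; for an integration, this suggests treating VARCHAR(n) values as ordinary strings while expecting writes that exceed n characters to be rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VarcharColumn:
    """Hypothetical model: a plain string column plus a max-length check."""

    name: str
    max_length: int

    def validate(self, value: Optional[str]) -> Optional[str]:
        # NULLs pass; otherwise the only difference from STRING is the length rule.
        if value is not None and len(value) > self.max_length:
            raise ValueError(
                f"{self.name}: value of length {len(value)} exceeds VARCHAR({self.max_length})"
            )
        return value


code = VarcharColumn("country_code", 3)
print(code.validate("USA"))  # USA
```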

by MichTalebzadeh (Valued Contributor)
  • 214 Views
  • 0 replies
  • 1 kudos

Navigating the Future: Key Questions for Implementing Production-Quality GenAI

"The Big Book of Generative AI from Databricks" (https://lnkd.in/dR4VuEyQ) provides a comprehensive guide to understanding and implementing Generative AI (GenAI) effectively within an enterprise. Here are some critical questions that businesses should co...

Data Engineering
FinCrime
GenAI ArtificialIntelligence DataScience AI MachineLearning Innovation Technology EnterpriseAI
by sanjay (Valued Contributor II)
  • 10656 Views
  • 13 replies
  • 10 kudos

Spark tasks too slow and not doing parallel processing

Hi, I have a Spark job that processes a large dataset, and it is taking too long. In the Spark UI, I can see it is running 1 task out of 9. I'm not sure how to run this in parallel. I have already enabled autoscaling and am providing up to...
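One common (though not the only) reason a stage shows a single long-running task is data skew: after a shuffle, rows are assigned to partitions by a hash of the key, so if most rows share one key, one partition, and therefore one task, ends up with most of the work. The toy model below is plain Python, uses zlib.crc32 as a stand-in for Spark's own hash function, and runs on made-up keys; it only illustrates the effect.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition under simple hash partitioning."""
    counts = Counter(zlib.crc32(k.encode("utf-8")) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical skewed data: 9,000 rows share one key, 1,000 rows spread over 100 keys.
skewed = ["customer_0"] * 9000 + [f"customer_{i}" for i in range(1, 101) for _ in range(10)]
sizes = partition_sizes(skewed, 9)

print(sizes)
print(f"largest partition holds {max(sizes) / len(skewed):.0%} of the rows")
```

However many workers autoscaling adds, all rows with the hot key still go to the same partition, so the remedies worth investigating (for example repartitioning on a better key, salting the hot key, or Adaptive Query Execution's skew handling) depend on where the skew actually occurs.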

Latest reply by plondon (New Contributor II):

Would it be any different (i.e., faster) when using Spark within Azure?

