Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Martinitus
by New Contributor III
  • 5545 Views
  • 4 replies
  • 0 kudos

CSV Reader reads quoted fields inconsistently in last column

I just opened another issue: https://issues.apache.org/jira/browse/SPARK-46959. It corrupts data even when read with mode="FAILFAST"; I consider it critical, because basic functionality like this should just work!

Latest Reply
Martinitus
New Contributor III
  • 0 kudos

Either: [ 'some text', 'some text"', 'some text"' ]. Alternatively: [ '"some text"', 'some text"', 'some text"' ]. Probably the most sane behavior would be a parser error (with mode="FAILFAST"); just parsing garbage without warning the user is certainly not...

3 More Replies
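The behavior the reporter expects can be illustrated outside Spark. A strict CSV parser should refuse a stray character after a closing quote rather than silently produce data; here is a minimal sketch of that expectation using Python's standard csv module (not Spark's reader) with strict=True:

```python
import csv
import io

# A malformed row: a stray character after the closing quote of the last field.
malformed = 'id,comment\n1,"some text"x\n'

reader = csv.reader(io.StringIO(malformed), strict=True)
header = next(reader)          # ['id', 'comment']

try:
    next(reader)
    outcome = "parsed silently"
except csv.Error:
    outcome = "parser error"   # strict mode rejects bad quoting

print(outcome)                 # → parser error
```

This mirrors what mode="FAILFAST" is expected to do in Spark: raise an error on malformed quoting instead of parsing garbage.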
tomph
by New Contributor II
  • 1698 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Asset Bundles - Manage existing jobs

Hello, we are starting to experiment with Databricks Asset Bundles, especially to keep jobs aligned between workspaces. Is there a way to start managing existing jobs without erasing the previous run history? Thank you, Tommaso

Latest Reply
tomph
New Contributor II
  • 0 kudos

Great news, thanks!

1 More Replies
matt_stanford
by New Contributor III
  • 2052 Views
  • 1 replies
  • 0 kudos

Resolved! Type 2 SCD when using Auto Loader

Hi there! I'm pretty new to using Auto Loader, so this may be a really obvious fix, but it's stumped me for a few weeks, so I'm hoping someone can help! I have a small CSV file saved in ADLS with a list of pizzas for an imaginary pizza restaurant. I'...

Latest Reply
matt_stanford
New Contributor III
  • 0 kudos

So, I figured out what the issue was: I needed to delete the checkpoint folder. After I did this and re-ran the notebook, everything worked fine!

AlexPedurand
by New Contributor
  • 1261 Views
  • 1 replies
  • 0 kudos

SOAP API - Connection

Hello, we have a workflow in our team for the usual monthly tasks, to be run on the first working day of the month. Each of the ~20 users will run a clone of this workflow, most likely all around the same time but with different options. Because we don...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

Maybe you can set a lock before calling the SOAP API: python - Using a Lock with redis-py - Stack Overflow

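The mutual-exclusion idea behind the reply can be sketched without a Redis server. The class below is an illustrative single-machine lock built on atomic O_EXCL file creation; it only works for processes sharing one filesystem, so across separate clusters a shared lock service (such as the redis-py Lock from the linked answer) is the more robust choice. The name SimpleFileLock and the lock path are hypothetical:

```python
import os
import time

class SimpleFileLock:
    """Illustrative mutual-exclusion lock based on atomic O_EXCL file creation.

    Only valid for processes on one machine/filesystem; across clusters use a
    shared lock service (e.g. the redis-py Lock from the linked answer).
    """

    def __init__(self, path, poll_interval=0.05):
        self.path = path
        self.poll_interval = poll_interval

    def __enter__(self):
        while True:
            try:
                # O_CREAT | O_EXCL fails atomically if the file already exists,
                # so exactly one process acquires the lock.
                self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL)
                return self
            except FileExistsError:
                time.sleep(self.poll_interval)

    def __exit__(self, exc_type, exc, tb):
        os.close(self.fd)
        os.remove(self.path)

# Hypothetical usage: serialize the SOAP call across concurrent runs.
with SimpleFileLock("/tmp/soap_api.lock"):
    pass  # the SOAP API call would go here
```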
data1233
by New Contributor
  • 1317 Views
  • 1 replies
  • 0 kudos

create an array sorted by a field

How do I create an array from a field while applying sorting? How do I do this in Databricks, since Databricks does not support ORDER BY in array_agg? The same is possible in Snowflake (ARRAY_AGG) or Redshift (LISTAGG). SELECT ARRAY_AGG(O_ORDERKEY) WITH...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

%sql
SELECT array_sort(array_agg(col),
                  (left, right) -> CASE WHEN left < right THEN -1
                                        WHEN left > right THEN 1
                                        ELSE 0 END) AS arr_col
FROM VALUES (3), (2), (1) AS tab(col);
https://docs.databricks.com/en/sql/language-manual/functions/array_sort.h...

jerryrard
by New Contributor
  • 1446 Views
  • 2 replies
  • 0 kudos

Python Databricks how to run all cells in another notebook except the last cell

I have a Python Databricks notebook from which I want to call/run another Databricks notebook using dbutils.notebook.run()... but I want to run all the cells in the "called" notebook except the last one. Is there a way to do a count of cells in the called ...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

Alternatively, you can use dbutils.notebook.run to pass parameters, use dbutils.widgets.get in the called notebook to read the parameter values, and check those values to decide whether to execute the code in the specified cell...

1 More Replies
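The pattern in the reply can be sketched outside Databricks: the caller passes a flag via dbutils.notebook.run(...), the called notebook reads it with dbutils.widgets.get(...), and the final cell is wrapped in a check. The parameter name run_last_cell below is hypothetical, and widget values arrive as strings, so the check compares text:

```python
def should_run_last_cell(widget_values: dict) -> bool:
    """Mimics the dbutils.widgets.get pattern: widget values are strings.

    In a real notebook this would be:
        run_last = dbutils.widgets.get("run_last_cell") == "true"
    and the caller would pass the flag with:
        dbutils.notebook.run("called_notebook", 600, {"run_last_cell": "false"})
    """
    return widget_values.get("run_last_cell", "true") == "true"

# The last cell of the called notebook would then be guarded:
if should_run_last_cell({"run_last_cell": "false"}):
    pass  # the code of the final cell goes here, skipped in this case
```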
Kai
by New Contributor II
  • 2232 Views
  • 1 replies
  • 0 kudos

Resolved! Differences Between "TEMPORARY STREAMING TABLE" and "TEMPORARY STREAMING LIVE VIEW" in DLT

Hello Databricks community, I'm seeking clarification on the distinctions between the following two syntaxes: CREATE OR REFRESH TEMPORARY STREAMING TABLE and CREATE TEMPORARY STREAMING LIVE VIEW. To my understanding, both of these methods do not store data...

Latest Reply
gabsylvain
Databricks Employee
  • 0 kudos

Hi @Kai, The two syntaxes you're asking about, CREATE OR REFRESH TEMPORARY STREAMING TABLE and CREATE TEMPORARY STREAMING LIVE VIEW, are used in Delta Live Tables and have distinct purposes. CREATE OR REFRESH TEMPORARY STREAMING TABLE: This syntax i...

RyHubb
by New Contributor III
  • 3771 Views
  • 5 replies
  • 0 kudos

Resolved! Databricks asset bundles job and pipeline

Hello, I'm looking to create a job which is linked to a Delta Live Table. Given job code like this:
my_job_name:
  name: thejobname
  schedule:
    quartz_cron_expression: 56 30 12 * * ?
    timezone_id: UTC
    pause_stat...

Latest Reply
Yeshwanth
Databricks Employee
  • 0 kudos

@RyHubb  You can specify the variable of the ID and it will be materialized at deploy time. No need to do this yourself. An example is at https://github.com/databricks/bundle-examples/blob/24678f538415ab936e341a04fce207dce91093a8/default_python/...

4 More Replies
ramravi
by Contributor II
  • 12962 Views
  • 1 replies
  • 0 kudos

Spark is case sensitive?

Spark is case sensitive? Spark is not case sensitive by default. If you have the same column name in different cases (Name, name) and try to select either the "Name" or "name" column, you will get a column-ambiguity error. There is a way to handle this issue b...

Latest Reply
source2sea
Contributor
  • 0 kudos

Hi, even though I set the conf to true, writing to disk threw exceptions complaining about duplicate columns. Below is the error message: org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data to save: branchavailablity....

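One possible workaround for the error in the last reply (a hedged sketch, not an official Databricks recipe): since the writer rejects case-insensitive duplicate column names, you can suffix the duplicates before saving. The renaming logic is plain Python; in PySpark it might be applied as df.toDF(*dedupe_columns(df.columns)):

```python
def dedupe_columns(cols):
    """Suffix case-insensitive duplicate column names so a writer accepts them.

    Hypothetical PySpark usage: df.toDF(*dedupe_columns(df.columns))
    """
    seen = {}
    result = []
    for name in cols:
        key = name.lower()            # writers often compare case-insensitively
        count = seen.get(key, 0)
        result.append(name if count == 0 else f"{name}_{count}")
        seen[key] = count + 1
    return result

print(dedupe_columns(["Name", "name", "id"]))  # → ['Name', 'name_1', 'id']
```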
chad_woodhead
by New Contributor
  • 1598 Views
  • 1 replies
  • 0 kudos

Unity Catalog is missing column in Catalog Explorer

I have just altered one of my tables and added a column:
ALTER TABLE tpch.customer ADD COLUMN C_CUSTDETAILS struct<key:string,another_key:string,boolean_key:boolean,extra_key:string,int_key:long,nested_object:struct<more:long,arrayOne:array<string>>>
A...

Latest Reply
LaurentLeturgez
Databricks Employee
  • 0 kudos

Hi @chad_woodhead, weird behavior. I tried to reproduce this in my environment on a Delta external table as well as on a managed table, and I'm not able to reproduce what you encountered. Which DBR version are you using? I used the latest DBR 14...

leaw
by New Contributor III
  • 5040 Views
  • 7 replies
  • 0 kudos

Resolved! How to load xml files with spark-xml ?

Hello, I cannot load XML files. First, I tried to install the Maven library com.databricks:spark-xml_2.12:0.14.0 as the documentation says, but I could not find it. I only have HyukjinKwon:spark-xml:0.1.1-s_2.10, and with this one I get this error: DRIVE...

Latest Reply
Frustrated_DE
New Contributor III
  • 0 kudos

Mismatch on Scala version, my bad! Sorted

6 More Replies
rsamant07
by New Contributor III
  • 3765 Views
  • 2 replies
  • 1 kudos

DBT JOBS FAILING

Hi, we have dbt workflow jobs and they have been failing randomly for the last few days with the error below. Is there any known issue for this? Any help on the root cause would be helpful. Encountered an error: Runtime Error Database Error __init__() got an une...

Latest Reply
rsamant07
New Contributor III
  • 1 kudos

Setting dbt-databricks==1.7.3 solved this issue, but now we randomly get the error below. It sometimes gets fixed after restarting the cluster, but is there any permanent solution for this? from dbt.events import types_pb2 File "/databricks/python3/...

1 More Replies
SivaPK
by New Contributor II
  • 2322 Views
  • 2 replies
  • 0 kudos

Is it possible to share a Dashboard with a user inside the org that doesn't have a Databricks account?

Hello Team, is it possible to share a Dashboard with a user inside the organization who doesn't have a Databricks account? Assign a cluster to one notebook/dashboard and share it inside the organization with an SSO login possibility? Suppose we want to...

Data Engineering
account
sharing
sso
Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @SivaPK, this was not possible until now, but Databricks has released dashboard sharing with people who have no workspace access. This constraint was there because dashboards rely on notebooks, which require running clusters and accessing data, both of whic...

1 More Replies
Andyt
by New Contributor
  • 1044 Views
  • 1 replies
  • 0 kudos

Restore SQL editor

Are there any options to restore SQL editor queries after a workspace was accidentally deleted and restored?

Latest Reply
arpit
Databricks Employee
  • 0 kudos

@Andyt If the workspace is accidentally deleted, there is no way to retrieve content from the SQL editor.

WhistlePodu
by New Contributor
  • 1507 Views
  • 1 replies
  • 0 kudos

How to get Workflow status and error description programmatically ?

Hi, I want to capture some basic info from a workflow run and populate a table with that data. I want to add the logic programmatically in a notebook and run it by attaching it as a task in the workflow. Information required to be populated in the table: Job id, J...

Latest Reply
arpit
Databricks Employee
  • 0 kudos

@WhistlePodu You can review the Jobs API to get the other fields, like job status, etc.

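To make the reply concrete: the Jobs API 2.1 runs/get endpoint returns JSON with fields such as job_id, run_id, and a state object. Below is a hedged sketch of pulling out the fields the poster wants; the response shape is a trimmed-down assumption, so check the API reference for the full schema:

```python
def summarize_run(run: dict) -> dict:
    """Extract job-status fields from a Jobs API 2.1 runs/get response.

    `run` would come from e.g. GET /api/2.1/jobs/runs/get?run_id=<id>
    (only a trimmed-down response shape is assumed here).
    """
    state = run.get("state", {})
    return {
        "job_id": run.get("job_id"),
        "run_id": run.get("run_id"),
        "result_state": state.get("result_state"),    # e.g. SUCCESS / FAILED
        "error_message": state.get("state_message"),  # human-readable detail
    }

# Hypothetical sample response, trimmed to the fields used above.
sample = {
    "job_id": 11223344,
    "run_id": 455644833,
    "state": {"result_state": "FAILED", "state_message": "Task failed."},
}
print(summarize_run(sample))
```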

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group